ABSTRACT

In this paper we propose a method for automatic local time
adaptation of the spectrogram of an audio signal, based on its
decomposition within a Gabor multi-frame. The sparsity of the
analysis within each individual frame is evaluated through Rényi
entropy measures. According to the sparsity of the decompositions,
an optimal resolution and a reduced multi-frame are determined,
defining an adapted spectrogram with variable resolution and hop
size.

The composition of such a reduced multi-frame allows an immediate
definition of a dual frame: re-synthesis techniques for this
adapted analysis are easily derived from the traditional phase
vocoder scheme.

1. INTRODUCTION 

The quality of analysis and synthesis processes based on
time-frequency transforms is highly affected by the frames used for
the decomposition and the reconstruction of the signal. Traditional
methods based on single frames of atomic functions have important
limits: a Gabor frame imposes a fixed resolution over the whole
time-frequency plane, while a wavelet frame gives a strictly
determined variation of the resolution. Moreover, the user is
frequently asked to define the analysis window features, which is
not always a simple task even for experienced users.
The resolution of such analysis methods is linked to the time and
frequency concentration of the basic functions involved in the
decomposition. Frame theory ([1], [2], [3]) extends the concept of
orthonormal basis in a Hilbert space $H$: in our domain, it gives
a unified model for the description of decomposing systems based
on atomic functions. The set $\{\phi_\gamma\}_{\gamma \in \Gamma}$
is a frame for $H$ if there exist two positive constants $A$ and
$B$, called frame bounds, such that for all $f \in H$,

\[
A \|f\|^2 \le \sum_{\gamma \in \Gamma} |\langle f, \phi_\gamma \rangle|^2 \le B \|f\|^2 . \tag{1}
\]
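
As a finite-dimensional illustration of (1), the optimal frame bounds of a finite family of vectors are the extreme eigenvalues of its frame operator. The sketch below works in the plane only; the function name `frame_bounds_2d` and the three-vector example are assumptions made for illustration and are not part of the method described in this paper.

```python
import math

def frame_bounds_2d(atoms):
    """Optimal frame bounds (A, B) of a finite family of vectors in R^2.

    They are the smallest and largest eigenvalues of the frame operator
    S = sum_k phi_k phi_k^T, a symmetric 2x2 matrix. A == 0 means that
    the family does not span the plane, hence is not a frame.
    """
    sxx = sum(x * x for x, _ in atoms)
    sxy = sum(x * y for x, y in atoms)
    syy = sum(y * y for _, y in atoms)
    mean = 0.5 * (sxx + syy)
    radius = math.hypot(0.5 * (sxx - syy), sxy)
    return mean - radius, mean + radius

# Illustrative family: three unit vectors at 120 degrees from each other.
three_vectors = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
                 for k in range(3)]
A, B = frame_bounds_2d(three_vectors)
print(A, B)
```

When the two bounds coincide the frame is tight, and the analysis energy in (1) is the same multiple of the signal energy for every signal.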
The time-frequency concentration of an atom $\phi_\gamma$ in a frame
can be represented through its associated Heisenberg box: a rectangle
drawn in the time-frequency plane whose dimensions are linked
respectively to the time spread of the function and to the frequency
spread of its Fourier transform. In the Short Time Fourier
Transform, the boxes associated with the time-frequency shifts of the
window function $g$ have fixed dimensions in every area of the
time-frequency plane: the resolution is the same for all the
components of the signal. In the Wavelet Transform, lower frequency
components are represented with a higher frequency resolution, while
a higher time resolution is given to the higher frequency ones.
These constraints are not justified when analyzing a sound without a
priori knowledge of its features, as the best resolution tradeoff is
neither unique nor dependent on a single variable. It is therefore
useful to search for adaptive methods of sound analysis and
synthesis, and for algorithms whose operations are designed to change
locally according to the features of the analyzed signal.
Given $l \in \mathbb{R}^+$, the analysis resolution can be globally
modified with a scaling operation

\[
g^l(t) = \frac{1}{\sqrt{l}} \, g\!\left(\frac{t}{l}\right) , \tag{2}
\]

which has the effect of changing the ratio between the edges of
the Heisenberg box associated with $g$ while preserving its area: the
global time-frequency resolution is modified by favoring
concentration in one dimension to the detriment of the other. The
idea which has led to the definition of multiple Gabor frames ([4])
is to consider a decomposing system where all these different
resolution tradeoffs coexist, providing a more detailed description
of the signal. The drawback is the introduction of a high redundancy
which lowers the readability of the representation: methods for
appropriate reductions of these multiple frames are therefore needed,
typically using sparsity criteria.
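
The preservation of the Heisenberg box area under scaling can be checked with a short computation. The derivation below is a sketch under stated assumptions: the dilation is taken to be the unit-norm one, $g^l(t) = l^{-1/2} g(t/l)$, the window $g$ has unit norm and is centered at the origin in both time and frequency, and $\sigma_t$, $\sigma_\xi$ denote its time and frequency spreads (symbols introduced here for illustration).

```latex
\begin{align*}
\sigma_t^2(g^l) &= \int t^2 \, |g^l(t)|^2 \, dt
   = \frac{1}{l} \int t^2 \, |g(t/l)|^2 \, dt
   = l^2 \int s^2 \, |g(s)|^2 \, ds = l^2 \, \sigma_t^2(g) , \\
\widehat{g^l}(\xi) &= \sqrt{l} \, \hat{g}(l \xi)
   \;\Longrightarrow\;
   \sigma_\xi^2(g^l) = \int \xi^2 \, l \, |\hat{g}(l \xi)|^2 \, d\xi
   = \frac{1}{l^2} \, \sigma_\xi^2(g) , \\
\sigma_t(g^l) \, \sigma_\xi(g^l) &= l \, \sigma_t(g) \cdot \frac{1}{l} \, \sigma_\xi(g)
   = \sigma_t(g) \, \sigma_\xi(g) .
\end{align*}
```

The time edge of the box is multiplied by $l$ and the frequency edge divided by $l$, so only the ratio between the edges changes.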
A promising approach ([5]) takes into account Rényi entropies, a
generalization of the Shannon entropy: given a unit-energy signal
$f \in L^2(\mathbb{R})$ and a time-frequency representation
$\Phi_f(u, \xi)$ of $f$, the Rényi entropy of the representation is
defined for an order $\alpha > 0$, $\alpha \neq 1$, as follows

\[
H_\alpha(\Phi_f) = \frac{1}{1-\alpha} \log_2 \iint \Phi_f^\alpha(u, \xi) \, du \, d\xi . \tag{3}
\]



In this paper, the time-frequency representation $\Phi_f(u, \xi)$
considered is the spectrogram, as detailed in the next section. The
application to our problem relies on the idea that minimizing the
complexity or information over a set of time-frequency
representations of the same signal is equivalent to maximizing the
concentration and peakiness of the analysis, thus selecting the best
resolution tradeoff: a sparsity measure can consequently be defined
through an information measure. Methods inspired by this approach
have been shown to give interesting results both analytically and
numerically
([6]).

The proposed method of local time adaptation improves on
the definition of the analysis multi-frame: the user can specify a
finite arbitrary set of positive scaling factors
$L \subset \mathbb{R}^+$ corresponding to the available resolutions;
the algorithm then composes the different frames
$\{g^l_{n,k}\}_{(n,k) \in \mathbb{Z}^2}$ with $l \in L$ and $g^l$ as
in (2), and a multiple Gabor frame is obtained as the union of all
the given frames. The main improvement in comparison with [7] is that
we are not obliged to keep the same hop size within the individual
frame analyses, thus avoiding unnecessarily short hops for larger
windows: our method employs frames which share the same redundancy,
so that every analysis has the same overlap, with a significant gain
in computational cost.

The limit of our approach in comparison with [6] is that we apply
the entropy evaluation on the whole frequency dimension, thus
providing analyses which are adapted only in the time dimension.
On the other hand, the reduced multi-frame obtained with our
method allows a perfect reconstruction of the signal, which is not
provided by [6]: in our scheme, a single original frame is retained
for any analysis segment; therefore, a re-synthesis technique
can be defined as a straightforward extension of the least-squares
error estimation from the modified STFT presented in [8]. Our
method can thus easily be used to provide common time-frequency
processing frameworks with an adaptive analysis technique.

2. ENTROPY EVALUATION OF A SPECTROGRAM 

We will now describe the application of the entropy sparsity
measure to the spectrogram distribution. We will focus on discretized
spectrograms, as digital signal processing requires working with
sampled signals and distributions, even if most of the results can
be extended to the continuous case.
A Gabor frame is obtained by time shifting and frequency
transposing a window function $g$ according to a regular lattice.
Given a time step $a$ and a frequency step $b$ we write $u_n = an$
and $\xi_k = bk$, with $n, k \in \mathbb{Z}$; these two sequences
generate the nodes of the time-frequency lattice for the frame
$\{g_{n,k}\}_{(n,k) \in \mathbb{Z}^2}$ defined as

\[
g_{n,k}(t) = g(t - u_n) \, e^{2 \pi i \xi_k t} ; \tag{4}
\]



the nodes are the centers of the Heisenberg boxes associated with
the windows in the frame. The decomposition of a function
$f \in L^2(\mathbb{R})$ in a Gabor frame is simply a sampling of its
STFT according to such a lattice,

\[
S_f[n,k] = \langle f, g_{n,k} \rangle = \int f(t) \, g(t - u_n) \, e^{-2 \pi i \xi_k t} \, dt , \tag{5}
\]



and the squared modulus of this decomposition is the discretized
spectrogram,

\[
PS_f[n,k] = |S_f[n,k]|^2 . \tag{6}
\]
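
The sampled spectrogram can be sketched in a few lines. The code below is a minimal illustration, assuming a periodic Hanning window, a hop given in samples (the time step $a$) and a plain DFT instead of an FFT for readability; the names `hann` and `spectrogram` are hypothetical, and the phase is referred to the start of each frame, which does not affect the squared modulus.

```python
import cmath
import math

def hann(size):
    """Periodic Hann (Hanning) window of the given length."""
    return [math.sin(math.pi * i / size) ** 2 for i in range(size)]

def spectrogram(signal, window, hop, n_freq):
    """Discretized spectrogram PS[n][k] = |S[n][k]|^2 of a real signal.

    hop is the time step a in samples; the frequency step is
    b = 1 / n_freq cycles per sample. Only bins 0..n_freq/2 are kept.
    """
    size = len(window)
    frames = []
    for start in range(0, len(signal) - size + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(size)]
        row = []
        for k in range(n_freq // 2 + 1):
            coeff = sum(segment[i] * cmath.exp(-2j * math.pi * k * i / n_freq)
                        for i in range(size))
            row.append(abs(coeff) ** 2)
        frames.append(row)
    return frames

# A cosine centered exactly on bin 8 of a 64-point analysis.
tone = [math.cos(2 * math.pi * 8 * t / 64) for t in range(256)]
ps = spectrogram(tone, hann(64), hop=16, n_freq=64)
print(len(ps), len(ps[0]))
```

For this stationary tone every frame has its maximum at the same frequency bin, which is the time-independent situation discussed later in this section.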

Given a discrete spectrogram with time step $a$ and frequency step
$b$ as in (6), we look for an evaluation of its entropy over a
certain rectangle of the time-frequency plane
$[t_1, t_2] \times [\nu_1, \nu_2] \subseteq \mathbb{R}^2$.
The rectangle identifies a set of points $G \subset \mathbb{Z}^2$,
where $G = \{(n,k) \in \mathbb{Z}^2 : t_1 \le na \le t_2,\ \nu_1 \le kb \le \nu_2\}$.
Through an appropriate normalization we obtain the sequence

\[
PS_f^G[n,k] = \frac{PS_f[n,k]}{\sum_{[n',k'] \in G} PS_f[n',k']} , \tag{7}
\]



with $[n,k] \in G$, which can be seen as a discrete probability
density. As a discretization of the original continuous spectrogram,
every sample in $PS_f^G$ is related to a time-frequency region of
area $ab$; we thus obtain the Rényi entropy measure for (7) directly
from (3),

\[
H_\alpha^G(PS_f^G) = \frac{1}{1-\alpha} \log_2 \sum_{[n,k] \in G} \left( PS_f^G[n,k] \right)^\alpha + \log_2(ab) . \tag{8}
\]
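
The entropy measure of a sampled spectrogram region, with the normalization and the lattice term $\log_2(ab)$, can be sketched as follows. This is a minimal illustration: the function name `renyi_entropy` and its argument layout (a list of rows of non-negative samples) are assumptions made here.

```python
import math

def renyi_entropy(ps, alpha, a=1.0, b=1.0):
    """Renyi entropy (in bits) of a sampled spectrogram region G.

    ps    -- non-negative samples PS[n][k] over the region (list of rows)
    alpha -- entropy order, alpha > 0 and alpha != 1
    a, b  -- time and frequency steps of the lattice
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    values = [v for row in ps for v in row]
    total = sum(values)
    if total <= 0:
        raise ValueError("the region carries no energy")
    # Normalize to a discrete probability density, then apply the order.
    power_sum = sum((v / total) ** alpha for v in values if v > 0)
    return math.log2(power_sum) / (1.0 - alpha) + math.log2(a * b)

uniform = [[1.0] * 4 for _ in range(4)]
peaked = [[0.0, 0.0], [0.0, 5.0]]
print(renyi_entropy(uniform, 0.7), renyi_entropy(peaked, 0.7))
```

A region where the energy is spread evenly gives the largest value, while a region with a single non-zero sample gives the smallest one; halving the cell area $ab$ lowers the result by one bit.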



General properties of Rényi entropies can be found in [5], [10]
and [11]; we recall in particular those most closely related to
our problem. It is easy to show that for every finite discrete
probability density $P$ the entropy $H_\alpha(P)$ tends to
the Shannon entropy of $P$ as the order $\alpha$ tends to one.
Moreover, $H_\alpha(P)$ is a non-increasing function of $\alpha$, so

\[
\alpha_1 < \alpha_2 \;\Rightarrow\; H_{\alpha_1}(P) \ge H_{\alpha_2}(P) . \tag{9}
\]



As we are working with finite discrete densities we can also
consider the case $\alpha = 0$, which is simply the logarithm of the
number of elements in $P$; as a consequence
$H_0(P) \ge H_\alpha(P)$ for every admissible order $\alpha$.
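
These properties are easy to observe numerically on a small example. The sketch below uses an arbitrary four-point density chosen for illustration; the helper `renyi` is an assumption, and it counts only the non-zero elements when the order is zero.

```python
import math

def renyi(p, alpha):
    """Renyi entropy in bits of a finite probability vector p."""
    support = [x for x in p if x > 0]
    if alpha == 1:  # Shannon entropy, the limit for alpha -> 1
        return -sum(x * math.log2(x) for x in support)
    return math.log2(sum(x ** alpha for x in support)) / (1.0 - alpha)

p = [0.5, 0.25, 0.125, 0.125]
orders = [0.0, 0.3, 0.7, 0.999, 1.0, 1.001, 2.0, 5.0]
entropies = [renyi(p, alpha) for alpha in orders]
for alpha, value in zip(orders, entropies):
    print(alpha, round(value, 4))
```

The printed values decrease as the order grows, start from the logarithm of the number of elements at order zero, and pass through the Shannon entropy near order one.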

A third basic fact is that for every order $\alpha$ the Rényi entropy
$H_\alpha$ is maximum when $P$ is uniformly distributed, while it is
minimum and equal to zero when $P$ has a single non-zero value. Given
a generic $P$ and its entropy $H_\alpha(P)$ for a certain order
$\alpha$, we have that for any $\beta > \alpha$

\[
\frac{\alpha - 1}{\alpha} H_\alpha(P) \le \frac{\beta - 1}{\beta} H_\beta(P) . \tag{10}
\]
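
A short justification of this bound can be given, reading (10) as $\frac{\alpha-1}{\alpha} H_\alpha(P) \le \frac{\beta-1}{\beta} H_\beta(P)$ for $\beta > \alpha > 0$ with both orders different from one. The derivation below is a sketch; the notation $\|P\|_\alpha$ for the $\ell^\alpha$ (quasi-)norm of the density $P = (p_i)$ is introduced here for illustration.

```latex
\begin{align*}
H_\alpha(P) &= \frac{1}{1-\alpha} \log_2 \sum_i p_i^\alpha
             = \frac{\alpha}{1-\alpha} \log_2 \|P\|_\alpha ,
\qquad \|P\|_\alpha = \Big( \sum_i p_i^\alpha \Big)^{1/\alpha} , \\
\frac{\alpha-1}{\alpha} H_\alpha(P) &= -\log_2 \|P\|_\alpha .
\end{align*}
```

Since $\|P\|_\alpha$ is a non-increasing function of $\alpha$ for $\alpha > 0$, the quantity $-\log_2 \|P\|_\alpha$ is non-decreasing, which gives the inequality. For a uniform density over $N$ elements both sides reduce to $(1 - 1/\alpha) \log_2 N$ and $(1 - 1/\beta) \log_2 N$ respectively.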



All of these results give useful information on the values of
different measures on a single density $P$ as in (8), while the
relations between the entropies of two different densities $P$ and
$Q$ are in general hard to determine analytically; in our problem,
$P$ and $Q$ are two spectrograms of the same signal in the same
time-frequency area, based on two window functions with different
scaling as in (2).

When the spectrogram of a signal does not depend on time it is
easier to find such a relation, and it turns out to be the one
expected: let $PS_s^h$ be the sampled spectrogram of a sinusoid $s$
over the region $G$ with a window function $h$ of compact support;
then $PS_s^h$ is simply a translation in the frequency domain of
$\hat{h}$, the Fourier transform of the window, and it is therefore
time-independent. We choose a bounded set $L$ of admissible scaling
factors, so that the discretized support of the scaled windows $h^l$
still remains inside $G$ for any $l \in L$. It is not hard to prove
that the entropy of a spectrogram taken with such a scaled version
of $h$ is given by

\[
H_\alpha^G(PS_s^{h^l}) = H_\alpha^G(PS_s^h) - \log_2 l . \tag{11}
\]
The sparsity measure we are using looks for the window which
minimizes the entropy measure: we deduce from (11) that it is the
one obtained with the largest scaling factor available, and thus
with the largest time support. This is consistent with our
expectation, as stationary signals such as sinusoids are best
analyzed with a high frequency resolution: time-independence makes a
low time resolution acceptable. Moreover, this is true for any order
$\alpha$ used for the entropy calculation.
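
The entropy shift of $\log_2 l$ for a stationary signal can be checked numerically. The sketch below makes several assumptions: the window is $h(t) = \cos^2(\pi t)$ on $[-1/2, 1/2]$, its spectrum is taken from the closed form as a sum of three shifted sinc functions, the scaling is $h^l(t) = l^{-1/2} h(t/l)$, and only the frequency axis is sampled, since for a time-independent spectrogram the time axis adds the same constant for every window over a given segment. The function names and the grid parameters are hypothetical.

```python
import math

def hann_spectrum_sq(xi):
    """Squared modulus of the Fourier transform of cos^2(pi t) on [-1/2, 1/2]."""
    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return (0.5 * sinc(xi) + 0.25 * sinc(xi - 1) + 0.25 * sinc(xi + 1)) ** 2

def stationary_entropy(scale, alpha, b=0.005, xi_max=40.0):
    """Renyi entropy of one time-independent spectrogram slice.

    The spectrum of the scaled window is sqrt(scale) * h^(scale * xi);
    b is the frequency step and xi_max truncates the frequency axis.
    """
    n = int(xi_max / b)
    samples = [scale * hann_spectrum_sq(scale * k * b) for k in range(-n, n + 1)]
    total = sum(samples)
    power_sum = sum((v / total) ** alpha for v in samples if v > 0)
    return math.log2(power_sum) / (1.0 - alpha) + math.log2(b)

alpha = 0.7
reference = stationary_entropy(1.0, alpha)
for scale in (0.5, 2.0, 4.0):
    gap = reference - stationary_entropy(scale, alpha)
    print(scale, round(gap, 3), round(math.log2(scale), 3))
```

Each printed line compares the measured entropy decrease with $\log_2$ of the scaling factor; the larger window always gives the lower entropy, whatever the order.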

Symmetric considerations apply whenever the spectrogram of a 
signal does not depend on frequency, as for impulses. 

A last remark regards the dependency of (8) on the time and
frequency steps $a$ and $b$ used for the discretization of the
spectrogram. When considering signals as finite vectors, a signal and
its Fourier transform have the same length. Therefore in the STFT
the window length determines the number of frequency points, while
the sampling rate sets the frequency values: the definition of $b$ is
thus implicit in the window choice. In practice, the FFT algorithm
allows a number of frequency points larger than the signal length to
be requested: further frequency values are obtained as an
interpolation between the original ones by zero-padding the signal.
If the sampling rate is fixed, such a procedure yields a smaller $b$
as a consequence of a larger number of frequency points. We have
numerically verified that such a variation of $b$ has no impact on
the entropy calculation, so that the FFT size can be set according
to implementation needs.
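
The role of the term $\log_2 b$ in this invariance can be illustrated on a toy case. The sketch below idealizes zero-padding as a finer sampling of a continuous spectral peak, and uses a Gaussian bump as a stand-in for a window spectrum; both choices, as well as the function names, are assumptions made for illustration.

```python
import math

def sampled_entropy(density, alpha, step, half_width):
    """Renyi entropy of a 1-D density sampled with the given step,
    including the log2(step) term that compensates for the grid."""
    n = int(half_width / step)
    samples = [density(k * step) for k in range(-n, n + 1)]
    total = sum(samples)
    power_sum = sum((v / total) ** alpha for v in samples if v > 0)
    return math.log2(power_sum) / (1.0 - alpha) + math.log2(step)

def bump(xi):
    """Stand-in spectral peak (a Gaussian), not an actual window spectrum."""
    return math.exp(-0.5 * xi * xi)

alpha = 0.7
values = [sampled_entropy(bump, alpha, step, 12.0) for step in (0.1, 0.05, 0.025)]
print([round(v, 6) for v in values])
```

Without the $\log_2$ term of the step, each halving of the frequency step would raise the measure by one bit; with it, the three values agree.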

Regarding the time step $a$, we are working on the analytical
proof of an observation largely verified numerically: as long as the
decomposing system is a frame, the entropy measure is invariant to
redundancy variation, so the choice of $a$ can be ruled by
considerations on the invertibility of the decomposing frame without
losing coherence between the information measures of the different
analyses. This is a key point, as it states that the sparsity measure
obtained allows a total independence between the hop sizes of the
different analyses: with the implementation of proper structures to
handle multi-hop STFTs we have obtained a more efficient algorithm
than the ones imposing a fixed hop size, as in [7].
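
The computational gain of per-window hop sizes can be estimated by counting frames. The sketch below assumes that the hop is the window size divided by an integer `overlap_factor` (a hypothetical parameter, 4 meaning 75% overlap), and compares this with a single hop imposed by the smallest window; the window sizes are those of the examples in section 3, while the signal length is arbitrary.

```python
def equal_redundancy_hops(window_sizes, overlap_factor):
    """Hop size per window so that size / hop is the same for all frames."""
    hops = {}
    for size in window_sizes:
        if size % overlap_factor:
            raise ValueError(f"{size} is not divisible by {overlap_factor}")
        hops[size] = size // overlap_factor
    return hops

def frames_needed(n_samples, size, hop):
    """Number of full windows of the given size that fit in n_samples."""
    return 0 if n_samples < size else 1 + (n_samples - size) // hop

sizes = [512, 1024, 2048, 4096]
hops = equal_redundancy_hops(sizes, 4)
n_samples = 44100  # one second at 44.1 kHz
per_window = {s: frames_needed(n_samples, s, hops[s]) for s in sizes}
fixed_hop = {s: frames_needed(n_samples, s, hops[512]) for s in sizes}
print(per_window)
print(fixed_hop)
```

With a common hop, the largest window is evaluated almost eight times more often than its own overlap requires.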

3. ALGORITHM AND EXAMPLES 

We now summarize the main operations of the algorithm we have
developed, providing examples of its application. For the calculation
of the spectrograms we have used a Hanning window

\[
h(t) = \cos^2(\pi t) \, \chi_{[-\frac{1}{2}, \frac{1}{2}]}(t) , \tag{12}
\]

with $\chi$ the indicator function of the specified interval, but the
results thus obtained can be generalized to the entire class of
compactly supported window functions. We create a multiple Gabor
frame as in [4], using as mother functions some scaled versions of
$h$, obtained as in (2) with a finite set of positive real scaling
factors $L$.

Different spectrograms of segments of the signal are calculated 
with each one of the above frames: the length of the analysis seg- 
ment and the overlap between two consecutive segments are given 
as parameters. 

The different frames composing the multi-frame have the same
frequency step $b$ but different time steps $\{a_l : l \in L\}$: the
smallest and largest window sizes are given as parameters together
with $|L|$, the number of different windows composing the
multi-frame, and the global overlap needed for the analyses. The
algorithm fixes the intermediate sizes so that for each signal
segment the different frames have the same overlap between
consecutive windows, and so the same redundancy. This generates an
irregular time disposition of the multi-frame elements in each signal
segment, as illustrated in figure 1. Such a disposition causes a
different influence of the boundary parts of the signal on the
analyses of the different frames: the beginning and the end of the
signal segment have a higher energy when windowed in the smaller
frames. This is avoided with a preliminary weighting: the beginning
and the end of each signal segment are windowed respectively with the
first and second half of the largest analysis window. Such a
weighting does not concern the decomposition for re-synthesis
purposes, but only the analyses used for entropy evaluations.

Figure 1: An analysis segment: time locations of the Heisenberg
boxes associated to the multi-frame used in our algorithm.

For each signal segment we calculate the entropy of every
spectrogram as in (8), where $G$ is the rectangle with the analyzed
time segment as horizontal dimension and the whole frequency lattice
as vertical dimension. The sparsest local analysis is defined to be
the one with minimum Rényi entropy, and the best window is defined
accordingly. Adaptation is obtained over the time dimension, as for
every signal segment the selected analysis involves the whole
frequency dimension. An interpolation is performed over the
overlapping zones to avoid abrupt discontinuities in the tradeoff of
the resolutions.
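
The selection step can be sketched as a minimum search over the candidate analyses of each segment. In the illustration below the data layout (a dictionary mapping each window size to its spectrogram samples and lattice cell area $ab$), the function names and the toy values are all assumptions; the toy values only mimic a steady tone, which a short window smears over frequency, and a click, which a long window smears over time.

```python
import math

def renyi_entropy(samples, alpha, cell_area):
    """Entropy of one analysis segment; cell_area is the product a * b."""
    total = sum(samples)
    power_sum = sum((v / total) ** alpha for v in samples if v > 0)
    return math.log2(power_sum) / (1.0 - alpha) + math.log2(cell_area)

def select_windows(segments, alpha=0.7):
    """For each segment, return the window size with minimum entropy.

    segments -- list of dicts mapping a window size to a pair
                (flat list of spectrogram samples over G, cell area a * b).
    """
    choices = []
    for candidates in segments:
        entropies = {size: renyi_entropy(samples, alpha, area)
                     for size, (samples, area) in candidates.items()}
        choices.append(min(entropies, key=entropies.get))
    return choices

# Toy data: 8 short frames or 1 long frame per segment, 2 frequency bins.
steady_tone = {512: ([1.0] * 16, 1.0), 4096: ([1.0, 0.0], 8.0)}
click = {512: ([1.0, 1.0] + [0.0] * 14, 1.0), 4096: ([1.0, 1.0], 8.0)}
print(select_windows([steady_tone, click]))
```

The interpolation over the overlapping zones between segments is not covered by this sketch.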

The time-adapted analysis of the global signal is finally obtained
by suitably assembling the slices of the local sparsest analyses
obtained with the selected windows.

In figure 3 we give a first example of an adaptive analysis
performed by our algorithm with eight Hanning windows of different
sizes on a real instrumental sound, a B4 note played by a marimba:
this sound combines the need for a good time resolution at the
moment of the percussion with that of a good frequency resolution
on the harmonic resonance of the instrument. Both are provided by
the algorithm, as shown in the adapted spectrogram at the bottom of
figure 3. Moreover, we see that the pre-echo of the analysis at the
bottom of figure 2 is completely removed in the adapted spectrogram.

In figure 5 we give a second example with a synthetic sound, a
sinusoid with sinusoidal frequency modulation: as figure 4 shows,
a small window is best adapted where the frequency variation is
fast compared to the window length; on the other hand, the largest
window is better where the signal is almost stationary.

Figure 2: Two different spectrograms of a B4 note played by a
marimba, with Hanning windows of sizes 512 (top) and 4096 (bottom)
samples.

The re-synthesis method introduced in [8] gives a perfect
reconstruction of the signal as a weighted expansion of the
coefficients of its STFT in the original analysis frame. Let
$S_f[n,k]$ be the STFT of a signal $f$ with window function $h$ and
time step $a$; fixing $n$, through an iFFT we have a windowed segment
of $f$

\[
f_h(n, l) = h(na - l) \, f(l) , \tag{13}
\]



whose time location depends on $n$. An immediate perfect
reconstruction of $f$ is given by

\[
f(l) = \frac{\sum_{n=-\infty}^{\infty} h(na - l) \, f_h(n, l)}{\sum_{n=-\infty}^{\infty} h^2(na - l)} . \tag{14}
\]



We extend the same technique using a variable window $h$ and time
step $a$ according to the composition of the reduced multi-frame,
obtaining a perfect reconstruction as well. The interest of (14) is
that the given distribution need not be the STFT of a signal: for
example, a transformation $S^*[n,k]$ of the STFT of a signal could
be considered. In this case, (14) gives the signal whose STFT has
minimal least-squares error with respect to $S^*[n,k]$.
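
The weighted overlap-add of (14) can be sketched for a single fixed window as follows. This is a minimal illustration under stated assumptions: the windowed segments are kept in the time domain (the FFT and iFFT steps cancel when the coefficients are not modified), the window is indexed from the start of each frame, and the names `windowed_frames` and `resynthesize` are hypothetical.

```python
import math

def hann(size):
    """Periodic Hann (Hanning) window of the given length."""
    return [math.sin(math.pi * i / size) ** 2 for i in range(size)]

def windowed_frames(signal, window, hop):
    """Windowed segments of the signal, each stored with its start index."""
    size = len(window)
    return [(start, [signal[start + i] * window[i] for i in range(size)])
            for start in range(0, len(signal) - size + 1, hop)]

def resynthesize(frames, window, length):
    """Sum of window-weighted segments divided by the sum of squared windows."""
    numerator = [0.0] * length
    denominator = [0.0] * length
    for start, segment in frames:
        for i, value in enumerate(segment):
            numerator[start + i] += window[i] * value
            denominator[start + i] += window[i] ** 2
    # Samples never covered by a non-zero window value cannot be recovered.
    return [num / den if den > 1e-12 else 0.0
            for num, den in zip(numerator, denominator)]

signal = [math.sin(0.05 * t) + 0.3 * math.sin(0.31 * t) for t in range(512)]
window = hann(64)
frames = windowed_frames(signal, window, hop=16)
rebuilt = resynthesize(frames, window, len(signal))
print(max(abs(x - y) for x, y in zip(signal[1:], rebuilt[1:])))
```

Because each frame contributes its own window to both sums, the same accumulation applies when frames carry windows of different sizes, provided every sample is covered by at least one non-zero window value.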

The theoretical existence and the mathematical definition of the
canonical dual frame for reduced multi-frames like the one we
employ have recently been provided in [12]: there, the analysis and
re-synthesis framework is entirely defined within Gabor theory, but
no automatic adaptation is employed. We are at present working on
the analogies between the two approaches, to establish a unified
interpretation and develop further extensions.



4. CONCLUSIONS 

We have presented an algorithm for time adaptation of the
spectrogram resolution, which can be easily integrated in existing
frameworks for analysis, transformation and re-synthesis of an audio
signal: the adaptation is locally obtained through an entropy
minimization within a finite set of resolutions, which can be defined
by the user or left as default. The user can also specify the time
duration and overlap of the analysis segments where entropy
minimization is performed, to favor more or less discontinuous
adapted analyses.

Future improvements of this method will concern the spectrogram 
adaptation in both time and frequency dimensions: this will pro- 
vide a decomposition of the signal in several layers of analysis 
frames, thus requiring an extension of the proposed technique for 
re-synthesis. 

Figure 3: Example of an adaptive analysis performed by our algorithm with eight Hanning windows of different sizes from 512 to 4096
samples, on a B4 note played by a marimba sampled at 44.1 kHz: on top, the best window chosen as a function of time; at the bottom, the
adaptive spectrogram. The entropy order is $\alpha = 0.7$ and each analysis segment contains four frames of the largest window analysis with a
two-frame overlap between consecutive segments.

Figure 4: Two different spectrograms of a sinusoid with sinusoidal frequency modulation, with Hanning windows of sizes 512 (top) and
4096 (bottom) samples.




Figure 5: Example of an adaptive analysis performed by our algorithm with eight Hanning windows of different sizes from 512 to 4096
samples, on a sinusoid with sinusoidal frequency modulation synthesized at 44.1 kHz: on top, the best window chosen as a function of
time; at the bottom, the adaptive spectrogram. The entropy order is $\alpha = 0.7$ and each analysis segment contains four frames of the largest
window analysis with a three-frame overlap between consecutive segments.